Video-Based Action Recognition Using Rate-Invariant Analysis of Covariance Trajectories
نویسندگان
چکیده
Statistical classification of actions in videos is mostly performed by extracting relevant features, particularly covariance features, from image frames and studying time series associated with temporal evolutions of these features. A natural mathematical representation of activity videos is in form of parameterized trajectories on the covariance manifold, i.e. the set of symmetric, positive-definite matrices (SPDMs). The variable execution-rates of actions implies variable parameterizations of the resulting trajectories, and complicates their classification. Since action classes are invariant to execution rates, one requires rate-invariant metrics for comparing trajectories. A recent paper represented trajectories using their transported square-root vector fields (TSRVFs), defined by parallel translating scaled-velocity vectors of trajectories to a reference tangent space on the manifold. To avoid arbitrariness of selecting the reference and to reduce distortion introduced during this mapping, we develop a purely intrinsic approach where SPDM trajectories are represented by redefining their TSRVFs at the starting points of the trajectories, and analyzed as elements of a vector bundle on the manifold. Using a natural Riemannain metric on vector bundles of SPDMs, we compute geodesic paths and geodesic distances between trajectories in the quotient space of this vector bundle, with respect to the reparameterization group. This makes the resulting comparison of trajectories invariant to their re-parameterization. We demonstrate this framework on two applications involving video classification: visual speech recognition or lip-reading and hand-gesture recognition. In both cases we achieve results either comparable to or better than the current literature.
منابع مشابه
Action Change Detection in Video Based on HOG
Background and Objectives: Action recognition, as the processes of labeling an unknown action of a query video, is a challenging problem, due to the event complexity, variations in imaging conditions, and intra- and inter-individual action-variability. A number of solutions proposed to solve action recognition problem. Many of these frameworks suppose that each video sequence includes only one ...
متن کاملVideo Covariance Matrix Logarithm for Human Action Recognition in Videos
In this paper, we propose a new local spatiotemporal descriptor for videos and we propose a new approach for action recognition in videos based on the introduced descriptor. The new descriptor is called the Video Covariance Matrix Logarithm (VCML). The VCML descriptor is based on a covariance matrix representation, and it models relationships between different low-level features, such as intens...
متن کاملView-Invariance in Action Recognition
Automatically understanding human actions using motion trajectories derived from video sequences is a very challenging problem. Since an action takes place in 3-D, and is projected on 2-D image, depending on the viewpoint of the camera, the projected 2-D trajectory may vary. Therefore, the same action may have very different trajectories, and trajectories of different actions may look the same....
متن کاملCovariance of Motion and Appearance Featuresfor Spatio Temporal Recognition Tasks
In this paper, we introduce a novel descriptor for employing covariance of motion and appearance features for human action and gesture recognition. In our approach, we compute kinematic features from optical flow and first and second-order derivatives of intensities to represent motion and appearance respectively. These features are then used to construct covariance matrices which capture joint...
متن کاملVideo-based face recognition in color space by graph-based discriminant analysis
Video-based face recognition has attracted significant attention in many applications such as media technology, network security, human-machine interfaces, and automatic access control system in the past decade. The usual way for face recognition is based upon the grayscale image produced by combining the three color component images. In this work, we consider grayscale image as well as color s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1503.06699 شماره
صفحات -
تاریخ انتشار 2015